Text Retrieval through Corrupted Queries

Authors

  • Juan Otero Pombo
  • Jesús Vilares
  • Manuel Vilares Ferro
Abstract

Our work relies on the design and evaluation of experimental information retrieval systems able to cope with textual misspellings in queries. In contrast to previous proposals, commonly based on the consideration of spelling correction strategies and a word language model, we also report on the use of character n-grams as indexing support.
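As a rough illustration of why character n-grams can serve as indexing support when queries are misspelled, the sketch below splits words into overlapping character trigrams and measures how many a misspelled form still shares with the correct one. The function names `char_ngrams` and `ngram_overlap`, the choice of n = 3, and the use of the Dice coefficient are assumptions made for this example; the abstract does not specify the authors' actual configuration.

```python
def char_ngrams(word, n=3):
    """Return the overlapping character n-grams of a word (hypothetical helper)."""
    word = word.lower()
    if len(word) <= n:
        # Words no longer than n are kept whole as a single index term.
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_overlap(a, b, n=3):
    """Dice coefficient over the sets of character n-grams of two words."""
    grams_a = set(char_ngrams(a, n))
    grams_b = set(char_ngrams(b, n))
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # A transposition error still leaves some trigrams intact, so a
    # misspelled query term can partially match the indexed term.
    print(char_ngrams("retrieval"))
    print(ngram_overlap("retrieval", "retreival"))
```

With whole-word indexing, the misspelled form "retreival" matches nothing; with n-gram indexing, the trigrams it shares with "retrieval" still contribute to the match.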

Similar resources

Two Experiments on Retrieval With Corrupted Data and Clean Queries in the TREC-4 Adhoc Task Environment: Data Fusion and Pattern Scanning

We report on several experiments in using data fusion to improve information retrieval, and in approximate text and 5-gram matching methods for retrieval of corrupted text, in the TREC context.

RMIT University at TREC 2008: Legal Track

This paper reports on the participation of RMIT University in the 2008 TREC Legal Track Ad Hoc task. OCR errors can corrupt the document view formed by an information retrieval system, and substantially hinder the successful retrieval of relevant documents for user queries. In previous research, the presence of errors in OCR text was observed to lead to unstable and unpredictable retrieval effe...

Report on the TREC-5 Confusion Track

For TREC-5, retrieval from corrupted data was studied through retrieval of single target documents from a corpus which was corrupted by producing page images, corrupting the bit maps, and applying OCR techniques to the results. In general, methods which attempted a probabilistic estimation of the original clean text fare better than methods which simply accept corrupted versions of the query text.

Consistency Learning and Multiple Rankings Combination for Text Retrieval

Text retrieval is one of the most basic tasks in the field of information retrieval. This paper deals with retrieving relevant documents for text-based queries from a database. Several different methods for retrieving text are explored, and show widely differing performance on different queries. It is shown how each of those methods may be improved through a “consistency learning” framework, wh...

Application-Embedded Retrieval from Distributed Free-Text Collections

A framework is presented for application-embedded information retrieval from distributed free-text collections. An application’s usage is sampled by an embedded information retrieval system. Samples are converted into queries to distributed collections. Retrieval is adjusted through sample size and structure, anydata indexing, and dual space feedback. The framework is investigated with a retriev...

Publication date: 2008